The practical effect of batch on genomic prediction.
Abstract
Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts such as batch effects. Batch effects are known to alter or obscure the set of significant results and the biological conclusions of high-throughput studies. Here we examine the impact of batch effects on predictors built with genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes and estimated batches using date information. Using these data we show that (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing the expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize the correlation between batch and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction in studies where batch and outcome are highly correlated.
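The two findings above can be illustrated with a small simulation. This is a minimal sketch under several stated assumptions, and it is not the authors' method. It assumes two batches, a binary outcome, and a samples-by-probes expression matrix. It uses a per-probe Welch t-statistic between batches as the "batch-affected" score and a nearest-centroid classifier as the predictor. All function names (`batch_outcome_correlation`, `drop_batch_affected`, `simulate`, and so on) and the simulated effect sizes are hypothetical. The naive filter has a clear limitation: when batch and outcome are correlated, probes that carry real outcome signal also score high against batch, so the filter can discard genuine biology. That is the gap the second recommendation points to.

```python
import numpy as np

def batch_outcome_correlation(batch, outcome):
    """Pearson (phi) correlation between binary batch labels and binary outcomes."""
    return float(np.corrcoef(np.asarray(batch, float), np.asarray(outcome, float))[0, 1])

def batch_scores(X, batch):
    """Absolute Welch t-statistic per probe, comparing batch 1 vs batch 0 (X: samples x probes)."""
    X = np.asarray(X, float)
    b = np.asarray(batch).astype(bool)
    a, c = X[b], X[~b]
    diff = a.mean(axis=0) - c.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + c.var(axis=0, ddof=1) / len(c))
    return np.abs(diff / np.maximum(se, 1e-12))  # guard against zero-variance probes

def drop_batch_affected(X, batch, frac=0.1):
    """Return sorted indices of probes kept after removing the `frac` most batch-associated ones."""
    scores = batch_scores(X, batch)
    p = scores.size
    k = int(round(frac * p))
    return np.sort(np.argsort(scores)[: p - k])  # keep the lowest-scoring probes

def nearest_centroid_fit(X, y):
    """Hypothetical stand-in predictor: one mean expression profile per class."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    return classes, np.vstack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    d = ((np.asarray(X, float)[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

def simulate(n, p, agree, rng):
    """Hypothetical data: probes 0-9 carry outcome signal, probes 10-49 a batch shift (needs p >= 50).
    `agree` is the probability that a sample's batch label equals its outcome label."""
    y = rng.integers(0, 2, n)
    batch = np.where(rng.random(n) < agree, y, 1 - y)
    X = rng.normal(size=(n, p))
    X[:, :10] += 1.0 * y[:, None]
    X[:, 10:50] += 2.0 * batch[:, None]
    return X, y, batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, y_tr, b_tr = simulate(100, 200, agree=0.9, rng=rng)  # batch confounded with outcome
    X_te, y_te, _ = simulate(100, 200, agree=0.5, rng=rng)     # batch unrelated to outcome
    print("training batch/outcome correlation:", round(batch_outcome_correlation(b_tr, y_tr), 2))
    full = nearest_centroid_fit(X_tr, y_tr)
    acc_full = (nearest_centroid_predict(full, X_te) == y_te).mean()
    keep = drop_batch_affected(X_tr, b_tr, frac=0.25)
    filt = nearest_centroid_fit(X_tr[:, keep], y_tr)
    acc_filt = (nearest_centroid_predict(filt, X_te[:, keep]) == y_te).mean()
    # Results depend on the seed and the made-up effect sizes; no particular outcome is claimed.
    print(f"test accuracy, all probes: {acc_full:.2f}; batch-filtered: {acc_filt:.2f}")
```

Varying `agree` in the training call is a quick way to explore the first finding: the closer it is to 0.5, the weaker the batch/outcome correlation in the training data.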
Similar resources
Effect of marker density and trait heritability on the accuracy of genomic prediction over three generations
The aim of this study was to determine the effect of marker density, level of heritability, number of QTLs, and size of training set on the genomic accuracy over three generations. Thereby, a trait was simulated with heritability of 0.10, 0.25 or 0.40. For each animal, a genome with 20 chromosomes, 1 Morgan each, was simulated. Different marker densities (2000, 4000 and 6000 markers) and 400 an...
Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method
The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. By using simulated datasets, the performance of Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes and each chromosome was set as 1 Morgan length. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...
Prediction of the Operating Conditions in a Batch Distillation Column Using a Shortcut Method
A shortcut procedure, a quick and easy-to-use method for the design and simulation of multicomponent batch distillation, is used to predict the operating conditions for recovering xylene from solvent in an existing batch distillation column in a benzol refinery. The procedure can be used to investigate the effect of the operating parameters on the operation of column for three possible modes of batch d...
Accuracy of Genomic Prediction under Different Genetic Architectures and Estimation Methods
The accuracy of genomic breeding value prediction was investigated in various levels of reference population size, trait heritability and the number of quantitative trait loci (QTL). Five Bayesian methods, including Bayesian Ridge regression, BayesA, BayesB, BayesC and Bayesian LASSO, were used to estimate the marker effects for each of 27 scenarios resulting from combining three levels for her...
Removing batch effects for prediction problems with frozen surrogate variable analysis
Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. But genomic technologies are beginning to be used in clinical applications where sam...
Journal title:
- Statistical Applications in Genetics and Molecular Biology
Volume 11, Issue 3
Pages: -
Publication date: 2012